Skip to content

[fix][broker] Use compatible Avro name validator to allow '$' in schema record names#25193

Merged
mattisonchao merged 8 commits intomasterfrom
fixes.compatible.avro
Feb 24, 2026
Merged

[fix][broker] Use compatible Avro name validator to allow '$' in schema record names#25193
mattisonchao merged 8 commits intomasterfrom
fixes.compatible.avro

Conversation

@mattisonchao
Copy link
Member

@mattisonchao mattisonchao commented Jan 29, 2026

Motivation

After #24617 upgraded Avro to a newer version, the default Schema.Parser()
uses UTF_VALIDATOR which rejects the $ character in record names. This
breaks Protobuf schemas whose Avro representation contains $ in generated
nested type names (e.g. inner classes).

Modifications

  • Introduce a CompatibleNameValidator that allows $ in addition to letters,
    digits, and underscores, matching the previous Avro behavior.
  • Apply the custom validator to all broker-side Schema.Parser() instances
    that handle Protobuf schemas:
    • StructSchemaDataValidator
    • SchemaRegistryServiceImpl
    • AvroSchemaBasedCompatibilityCheck
  • Add DataRecord.proto for test reproduction.
  • Add unit tests for the CompatibleNameValidator and Protobuf schema
    compatibility.

Note: The client-side SchemaUtil already uses NameValidator.NO_VALIDATION.
A shared solution across broker and client (e.g. factory in pulsar-common)
can be addressed in a follow-up.

Verifying this change

  • Unit tests added for CompatibleNameValidator (valid names, invalid names,
    edge cases, error messages).
  • Integration test with ProtobufSchema<DataRecord> to reproduce the original issue.
  • CI checks pass.

Documentation

  • doc-not-needed

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a regression introduced by the Avro upgrade (1.12.0) where Protobuf-derived Avro schemas can fail parsing due to $ appearing in generated record names, by introducing a more permissive Avro NameValidator and adding tests to cover the scenario.

Changes:

  • Add a custom Avro NameValidator (CompatibleNameValidator) to allow $ in Avro record/field names during schema validation.
  • Add unit tests for the validator’s behavior and a reproduction test using a generated Protobuf schema.
  • Introduce a new Protobuf message (DataRecord.proto) used by the tests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java Uses a custom Avro NameValidator to accept $ during schema parsing/validation.
pulsar-broker/src/test/java/org/apache/pulsar/broker/service/schema/validator/SchemaDataValidatorTest.java Adds tests for the new validator and a Protobuf-based reproduction.
pulsar-broker/src/main/proto/DataRecord.proto Adds a Protobuf schema used to generate Avro with nested-type $ names for testing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mattisonchao mattisonchao marked this pull request as ready for review February 20, 2026 01:53
@mattisonchao mattisonchao reopened this Feb 20, 2026
@mattisonchao mattisonchao changed the title [fix][schema] Illegal character '$' in record [fix][broker] Use compatible Avro name validator to allow '$' in schema record names Feb 20, 2026
@mattisonchao mattisonchao requested a review from lhotari February 20, 2026 13:11
…ed in tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.63%. Comparing base (45def39) to head (246583a).
⚠️ Report is 53 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #25193      +/-   ##
============================================
+ Coverage     72.09%   72.63%   +0.53%     
- Complexity     2303    34426   +32123     
============================================
  Files          1956     1959       +3     
  Lines        154938   155418     +480     
  Branches      17670    17731      +61     
============================================
+ Hits         111706   112882    +1176     
+ Misses        34233    33546     -687     
+ Partials       8999     8990       -9     
Flag Coverage Δ
inttests 25.58% <50.00%> (-0.36%) ⬇️
systests 22.35% <60.00%> (-0.13%) ⬇️
unittests 73.62% <100.00%> (+0.66%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...vice/schema/AvroSchemaBasedCompatibilityCheck.java 83.78% <100.00%> (ø)
...oker/service/schema/SchemaRegistryServiceImpl.java 80.92% <100.00%> (+0.57%) ⬆️
...ce/schema/validator/StructSchemaDataValidator.java 84.44% <100.00%> (+7.77%) ⬆️

... and 186 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mattisonchao mattisonchao merged commit 3cd3893 into master Feb 24, 2026
53 checks passed
@mattisonchao mattisonchao deleted the fixes.compatible.avro branch February 24, 2026 16:35
lhotari pushed a commit that referenced this pull request Feb 24, 2026
@lhotari lhotari added this to the 4.2.0 milestone Feb 24, 2026
Comment on lines +55 to +56
Schema.Parser parser =
new Schema.Parser(StructSchemaDataValidator.COMPATIBLE_NAME_VALIDATOR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also check the JsonSchemaCompatibilityCheck.java which still use Schema.Parser fromParser = new Schema.Parser(); in isAvroSchema check.

    private boolean isAvroSchema(SchemaData schemaData) {
        try {

            Schema.Parser fromParser = new Schema.Parser();
            fromParser.setValidateDefaults(false);
            Schema fromSchema = fromParser.parse(new String(schemaData.getData(), UTF_8));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I'll fix it too. :))

mattisonchao added a commit that referenced this pull request Feb 24, 2026
…bilityCheck

Apply the same COMPATIBLE_NAME_VALIDATOR to JsonSchemaCompatibilityCheck's
isAvroSchema() method to allow '$' in schema record names, matching the
fix already applied to AvroSchemaBasedCompatibilityCheck, SchemaRegistryServiceImpl,
and StructSchemaDataValidator in #25193.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mattisonchao added a commit that referenced this pull request Feb 24, 2026
…bilityCheck

Apply the same COMPATIBLE_NAME_VALIDATOR to JsonSchemaCompatibilityCheck's
isAvroSchema() method to allow '$' in schema record names, matching the
fix already applied to AvroSchemaBasedCompatibilityCheck, SchemaRegistryServiceImpl,
and StructSchemaDataValidator in #25193.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants